Prior Art Search and Its Evaluation
Abstract
Prior art search is an information-seeking task in which searchers, for instance patent examiners, search published literature to determine whether the invention claimed in a patent application is novel. In prior art search, search tasks are often time-sensitive and involve rich information needs with multiple aspects or subtopics. In this thesis, we explore information retrieval techniques and evaluation metrics for prior art search. The work consists of three parts. Given a patent application document, we first retrieve relevant patent documents by performing retrieval at both the document level and the passage level. At the document level, we focus on automatically formulating effective search queries. The queries are formed by extracting terms from claims, titles and hyphenating phrases in a patent application, and are refined based on Inverse Document Frequency (IDF) and Part-of-Speech (POS) tagging. At the passage level, we propose a TF-IDF-based retrieval algorithm that calculates a relevance score for each passage and selects the most relevant passages. Second, we propose a novel evaluation metric for prior art search. The new evaluation metric, termed the Cube Test (CT), is based on a proposed conceptual user utility model, the Water Filling Model, which describes the process of prior art search. We compare our metric with existing prior art search evaluation metrics, as well as with existing Web search evaluation metrics, in terms of correlation and discriminative power. Experimental results show that our metric effectively captures the characteristics of prior art search.
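The passage-level step can be illustrated with a minimal sketch of generic TF-IDF passage scoring. This is not the thesis's algorithm, whose exact weighting is not given in the abstract. The function names (`tokenize`, `idf_table`, `rank_passages`), the smoothed IDF formula, and the sample passages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(tokenized_passages):
    """Smoothed IDF per term (an assumed formula, not the thesis's)."""
    n = len(tokenized_passages)
    df = Counter()
    for tokens in tokenized_passages:
        df.update(set(tokens))  # count each term once per passage
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def rank_passages(query, passages, top_k=2):
    """Score each passage by summed TF * IDF over the query terms."""
    tokenized = [tokenize(p) for p in passages]
    idf = idf_table(tokenized)
    query_terms = set(tokenize(query))
    scored = []
    for index, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = sum(tf[t] * idf.get(t, 0.0) for t in query_terms)
        scored.append((score, index))
    # Highest score first; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(passages[i], s) for s, i in scored[:top_k]]


# Hypothetical passages from a candidate patent document.
passages = [
    "A rotor blade with a curved tip reduces noise.",
    "The housing is made of steel.",
    "The curved rotor blade tip is coated.",
]
for text, score in rank_passages("curved rotor blade", passages):
    print(f"{score:.3f}  {text}")
```

In a setting like the one the abstract describes, the query terms would come from the patent application itself (for example, from its claims) rather than from a hand-written string.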
Similar resources
A Methodology for Building a Patent Test Collection for Prior Art Search
This paper proposes a methodology for the construction of a patent test collection for the task of prior art search. Key to the justification of the methodology is an analysis of the nature and structure of patent documents and the patenting process. These factors enable a corpus of patent documents to be reverse engineered in order to arrive at high quality, realistic, relevance assessments. T...
CLEF-IP 2011 Working Notes: Utilizing Prior Art Candidate Search Results for Refined IPC Classification
For the refined IPC classification in the CLEF-IP 2011 task, we constructed a classification system with KNN classification that uses PAC (Prior Art Candidate) search results as neighbors. We also slightly modified the neighborhood evaluation and furnished a simple PAC search system. We produced some running results in both PAC search and classification, and evaluated our system. Our test s...
Automatic Prior Art Searching and Patent Encoding at CLEF-IP '10
In the intellectual property field two tasks are of high relevance: prior art searching and patent classification. Prior art search is fundamental for many strategic issues such as patent granting, freedom to operate and opposition. Accurate classification of patent documents according to the IPC code system is vital for the interoperability between different patent offices and for the prior ar...
Automatic Learning of A Supervised Classifier for Patent Prior Art Retrieval
Prior art retrieval is the process of determining a set of possibly relevant prior arts for a specific patent or patent application. Such process is essential for various patent practices, e.g. patentability search, validity search, and infringement search. To support the automatic retrieval of prior arts, existing studies generally adopt the traditional information retrieval (IR) approach or e...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
Publication date: 2014